Skip to content

Conversation

encukou
Copy link
Member

@encukou encukou commented Dec 9, 2024

Sorry for getting to this late.

GH-126025 optimized the UTF-8 decoder with a fast find_first_nonascii function.
GH-127566 then fixed an alignment issue by switching one case to memcpy -- on all platforms, even those that used the load_unaligned helper (which does a byte-by-byte copy), to memcpy.

This switches the remaining load_unaligned call to memcpy, and removes the now-unused helper.

@methane
Copy link
Member

methane commented Dec 9, 2024

Did you benchmarked it?
I didn't use memcpy because it is much slower than manual copy.
As I remember, I tried switch-case + fixed size memcpy, but it is not as fast as manual copy on some compilers.

@methane
Copy link
Member

methane commented Dec 10, 2024

result:

gcc 13.2.0
3. 0.007522980 ns (memcpy)
3. 0.006466835 ns (switch + memcpy)
4. 0.003683150 ns (switch + shift)
5. 0.003133516 ns (switch + union)

clang 18.1.3
3. 0.007062990 ns (memcpy)
3. 0.003032249 ns (switch + memcpy)
4. 0.002390183 ns (switch + shift)
5. 0.003310016 ns (switch + union)

code:
https://github.com/methane/notes/blob/02fdadf71aed28dd7b8a512b0ff2079f22033e55/c/first_nonascii/nonascii.c#L39-L150

current implementation uses switch+union, but switch+shift might be better because we use clang for JIT.

}

// less than size_t bytes left.
assert(end - p <= SIZEOF_SIZE_T);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion would fail when not PY_LITTLE_ENDIAN && HAVE_CTZ case.

                // big endian and minor compilers are difficult to test.
                // fallback to per byte check.
                break;

@encukou
Copy link
Member Author

encukou commented Dec 10, 2024

Ah, my benchmark was flawed. Sorry for the noise!

@encukou encukou closed this Dec 10, 2024
@encukou encukou deleted the memcpy-unaligned branch December 11, 2024 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants